Huffman Coding as a Non-linear Dynamical System 
In this paper, source coding or data compression is viewed as a measurement problem. Given
a measurement device with fewer states than the observable of a stochastic source, how can one
capture the essential information? We propose modeling stochastic sources as piecewise linear
discrete chaotic dynamical systems known as Generalized Luroth Series (GLS), which date back to
Georg Cantor's work in 1869. The Lyapunov exponent of GLS is equal to Shannon's entropy
of the source (up to a constant of proportionality). By successively approximating the source with
GLS having fewer states (with the closest Lyapunov exponent), we derive a binary coding algorithm
which exhibits minimum redundancy (the least average codeword length with integer codeword
lengths). This turns out to be a re-discovery of Huffman coding, the popular lossless compression
algorithm used in the JPEG international standard for still image compression.
I. SOURCE CODING SEEN AS A
MEASUREMENT PROBLEM 



A practical problem an experimental physicist would
face is the following: a process (e.g., a particle moving in
space) has an observed variable (say, the position of the
particle) which potentially takes N distinct values, but the
measuring device is capable of recording only M values,
with M < N. In such a scenario (Figure 1), how can we
make use of these M states of the measuring device to
capture the essential information of the source? It may
be the case that the observed variable takes values from an
infinite set, but the measuring device is capable of recording
only a finite number of states. However, we shall assume
that N is finite, while allowing for the possibility that N ≫ M
(e.g., it is possible that N = 10^ and M = 2).

Our aim is to capture the essential information of the
source (the process is treated as a source and the observations
as messages from the source) in a lossless fashion.
This problem goes all the way back to Shannon [1],
who gave a mathematical definition for the information
content of a source. He defined it as 'Entropy',
a term borrowed from statistical thermodynamics. Furthermore,
his now famous noiseless source coding theorem
states that it is possible to encode the information
of a memoryless source (assuming that the observables
are independent and identically distributed (i.i.d)) using
(at least) H(X) bits per symbol, where H(X) stands for
the Shannon's entropy of the source X. Stated in other
words, the average codeword length satisfies $\sum_i l_i p_i \geq H(X)$,
where $l_i$ is the length of the i-th codeword and $p_i$ the
corresponding probability of occurrence of the i-th alphabet
of the source.
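
The bound stated above can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-symbol source with probabilities {0.7, 0.2, 0.1} and codeword lengths {1, 2, 2} (for instance the prefix-free code {0, 10, 11}); the function names are illustrative and not taken from the paper.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(X) in bits for a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average codeword length: sum over i of l_i * p_i."""
    return sum(l * p for l, p in zip(lengths, probs))

# Hypothetical example: three symbols, codeword lengths of {0, 10, 11}.
probs = [0.7, 0.2, 0.1]
lengths = [1, 2, 2]

H = shannon_entropy(probs)
L = average_length(probs, lengths)
print(f"H(X) = {H:.4f} bits, average length = {L:.4f} bits")
```

For this source the average length exceeds the entropy, as the noiseless source coding theorem requires for any uniquely decodable code.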







[Figure 1 diagram: Process (Source) with N states → Measuring Device with M symbols → codewords]



* Electronic address: nithin'nagaraj@yahoo.com

URL: http://nithin.nagaraj.googlepages.com



FIG. 1: Source coding as a measurement problem. Typically,
M ≪ N. If M = 2, we are seeking binary codes.



Shannon's entropy H(X) defines the ultimate limit for
lossless data compression. Data compression is a very im- 
portant and exciting research topic in Information the- 
ory since it not only provides a practical way to store 
bulky data, but it can also be used effectively to measure 
entropy, estimate complexity of sequences and provide 
a way to generate pseudo-random numbers (which 
are necessary for Monte-Carlo simulations and Crypto- 
graphic protocols). 

Several researchers have investigated the relationship
between chaotic dynamical systems and data compression
(more generally, between chaos and information theory).
Jiménez-Montaño, Ebeling, and others [2] have proposed
coding schemes based on a symbolic substitution method.
This was shown to be an optimal data compression
algorithm by Grassberger [3] and also to accurately estimate
Shannon's entropy [4] and Lyapunov exponents
of dynamical systems [5]. Arithmetic coding, a popular
data compression algorithm used in JPEG2000, was recently
shown to be a specific mode of a piecewise linear
chaotic dynamical system [6]. In another work [7], we
have used symbolic dynamics on chaotic dynamical systems
to prove the famous Kraft-McMillan inequality and
its converse for prefix-free codes, a fundamental inequality
in source coding, which also has a quantum analogue.

In this paper, we take a nonlinear dynamical systems
approach to the aforementioned measurement problem.
We are interested in modeling the source by a nonlinear
dynamical system. By a suitable model, we hope
to capture the information content of the source. This
paper is organized as follows. In Section II, stochastic
sources are modeled using piecewise linear chaotic
dynamical systems which exhibit some important and
interesting properties. In Section III, we propose a new
algorithm for source coding and prove that it achieves
the least average codeword length and turns out to be a
re-discovery of Huffman coding [8], the popular lossless
compression algorithm used in the JPEG international
standard [9] for still image compression. We make some
observations about our approach in Section IV and
conclude in Section V.



II. SOURCE MODELING USING PIECEWISE 
LINEAR CHAOTIC DYNAMICAL SYSTEMS 

We shall consider stationary sources. These are defined
as sources whose statistics remain constant with
respect to time [10]. These include independent and
identically distributed (i.i.d) sources and ergodic (Markov)
sources. These sources are very important in modeling
various physical/chemical/biological phenomena and in
engineering applications.

On the other hand, non-stationary sources are those
whose statistics change with time. We shall not deal with
them here. However, most coding methods are applicable
to these sources with some suitable modifications.



A. Embedding an i.i.d Source using Generalized 
Luroth Series 

Consider an i.i.d source X (treated as a random variable)
which takes values from a set of N values
$A = \{a_1, a_2, \ldots, a_N\}$ with probabilities $\{p_1, p_2, \ldots, p_N\}$
respectively, with the condition $\sum_{i=1}^{N} p_i = 1$.

An i.i.d source can be simply modeled as a (memory- 
less) Markov source (or Markov process fl^) with the 
transition probability from state i to j as being indepen- 
dent of state i (and all previous states) [l3| . We can then 
embed the Markov source into a dynamical system as fol- 
lows: to each Markov state (i.e. to each symbol in the 
alphabet), associate an interval on the real line segment 
[0, 1) such that its length is equal to the probability of
that symbol. Any two such intervals have disjoint interiors
and the union of all the intervals covers [0, 1). Such a
collection of intervals is known as a partition. We define a
deterministic map T on the partitions such that they form a
Markov partition (they satisfy the property that the image
of each interval under T covers an integer number of
partitions [12]).

The simplest way to define the map T such that the
intervals form a Markov partition is to make it linear on
each interval and surjective onto [0, 1). This is depicted in
Figure 2(a). Such a map is known as a Generalized Luroth
Series (GLS). There are other ways to define the map T
(e.g., see [11]), but for our purposes GLS will suffice.
Luroth's paper in 1883 (see reference in Dajani et al. [13])
deals with number


theoretical properties of Luroth series (a specific case of 
GLS). However, Georg Cantor had discovered GLS ear- 
lier in 1869 [11 [H. 
(b) $2^N$ modes of GLS.



FIG. 2: Embedding an i.i.d source into a Generalized Luroth 
Series (GLS). 
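
The embedding described in this subsection can be sketched in a few lines. The code below is an illustration only: it assumes the positive-slope mode on every interval, uses exact rational arithmetic to avoid round-off, and the names `make_gls` and `symbolic_sequence` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def make_gls(probs):
    """Return (edges, T, symbol_of) for the positive-slope GLS on [0, 1)."""
    edges = [Fraction(0)] + list(accumulate(probs))  # partition boundaries

    def symbol_of(x):
        # Index of the interval [edges[i], edges[i+1]) containing x.
        for i in range(len(probs)):
            if edges[i] <= x < edges[i + 1]:
                return i
        raise ValueError("x must lie in [0, 1)")

    def T(x):
        i = symbol_of(x)
        # Linear and onto [0, 1): slope 1/p_i on the i-th interval.
        return (x - edges[i]) / probs[i]

    return edges, T, symbol_of

def symbolic_sequence(x, T, symbol_of, n):
    """First n symbols (interval indices) visited by the trajectory of x."""
    out = []
    for _ in range(n):
        out.append(symbol_of(x))
        x = T(x)
    return out

probs = [Fraction(7, 10), Fraction(2, 10), Fraction(1, 10)]
edges, T, symbol_of = make_gls(probs)
seq = symbolic_sequence(Fraction(3, 4), T, symbol_of, 5)
print(seq)
```

Each interval has length equal to the probability of its symbol, and T stretches every interval linearly onto [0, 1), so the intervals form a Markov partition in the sense used above.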



B. Some Important Properties of GLS 

A list of important properties of GLS is given below: 

1. GLS preserves the Lebesgue (probability) measure. 

2. Every (infinite) sequence of symbols from the alphabet
corresponds to a unique initial condition.

3. The symbolic sequence of every initial condition is 
i.i.d. 

4. GLS is Chaotic (positive Lyapunov exponent, pos- 
itive Topological entropy). 

5. GLS has maximum topological entropy (= ln(N))
for a specified number of alphabets (N). Thus, all
possible arrangements of the alphabets can occur
as symbolic sequences.

6. GLS is isomorphic to the Shift map and hence Er- 
godic (Bernoulli). 
7. Modes of GLS: As can be seen from Figure 2(b),
the slope of the line that maps each interval to [0, 1)
can be chosen to be either positive or negative.
These choices result in a total of $2^N$ modes of GLS
(up to a permutation of the intervals along with
their associated alphabets for each mode; these
permutations are N! in number).

It is properties 2 and 3 that allow a faithful "embedding"
of a stochastic i.i.d source. For a proof of these
properties, please refer to Dajani et al. [13]. Some well
known GLS are the standard Binary map and the standard
Tent map shown in Figure 3.
FIG. 3: Some well known GLS: (a) Standard Binary map
($x \mapsto 2x$, $0 \leq x < 0.5$; $x \mapsto 2x - 1$, $0.5 \leq x < 1$). (b)
Standard Tent map ($x \mapsto 2x$, $0 \leq x < 0.5$; $x \mapsto 2 - 2x$,
$0.5 \leq x < 1$).
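
The two maps of Figure 3 and their symbolic sequences can be sketched as follows. This is an illustrative example, assuming the symbol '0' is assigned to [0, 0.5) and '1' to [0.5, 1); the initial condition 3/7 and the helper name `itinerary` are arbitrary choices, not from the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def binary_map(x):
    """Standard binary map: x -> 2x on [0, 0.5), x -> 2x - 1 on [0.5, 1)."""
    return 2 * x if x < HALF else 2 * x - 1

def tent_map(x):
    """Standard tent map: x -> 2x on [0, 0.5), x -> 2 - 2x on [0.5, 1)."""
    return 2 * x if x < HALF else 2 - 2 * x

def itinerary(x, T, n):
    """First n symbols of the trajectory of x under the map T."""
    bits = []
    for _ in range(n):
        bits.append("0" if x < HALF else "1")
        x = T(x)
    return "".join(bits)

x0 = Fraction(3, 7)
print(itinerary(x0, binary_map, 4))  # leading binary digits of 3/7
print(itinerary(x0, tent_map, 4))
```

On the binary map the symbolic sequence is simply the binary expansion of the initial condition; the tent map yields a different sequence for the same point because its second branch has negative slope.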



C. Lyapunov Exponent of GLS = Shannon's 
Entropy 



It is easy to verify that GLS preserves the Lebesgue
measure. A probability density $\Pi(x)$ on [0, 1) is invariant
under the given transformation T if, for each interval
$[c, d] \subset [0, 1)$, we have:

$$\int_c^d \Pi(x)\,dx = \int_S \Pi(x)\,dx, \qquad (1)$$

where $S = T^{-1}([c, d]) = \{x \mid c \leq T(x) \leq d\}$.

For the GLS, the above condition has constant probability
density on [0, 1) as the only solution. It then follows
from Birkhoff's ergodic theorem that the asymptotic
probability distribution of the points of almost every
trajectory is uniform. We can hence calculate the Lyapunov
exponent as follows:

$$\lambda = \int_0^1 \log_2(|T'(x)|)\,\Pi(x)\,dx \quad \text{(almost everywhere)}. \qquad (2)$$

Here, we measure $\lambda$ in bits/iteration.

Since $\Pi(x)$ is uniform with value 1 on [0, 1) and $T'(x)$ is
constant on each interval (T(x) is linear there, with
$|T'(x)| = 1/p_i$ on the i-th interval of length $p_i$), the
above expression simplifies to:

$$\lambda = -\sum_{i=1}^{N} p_i \log_2(p_i) \quad \text{(almost everywhere)}. \qquad (3)$$

This turns out to be equal to Shannon's entropy of the
i.i.d source X. Thus the Lyapunov exponent of the GLS that
faithfully embeds the stochastic i.i.d source X is equal to
the Shannon's entropy of the source. The Lyapunov exponent
can be understood as the amount of information in bits
revealed by the symbolic sequence (measurement) of the
dynamical system in every iteration [15]. It can be seen
that the Lyapunov exponent is the same for all the modes of
the GLS. The Lyapunov exponent for binary i.i.d
sources is plotted in Figure 4 as a function of p (the
probability of symbol '0').

FIG. 4: Lyapunov exponent $\lambda(p) = -p\log_2(p) - (1-p)\log_2(1-p)$
in units of bits/iteration plotted against p for
binary i.i.d sources. The maximum occurs at p = 1/2. Note
that $\lambda(p) = \lambda(1-p)$.
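
The equality between the Lyapunov exponent of a GLS and Shannon's entropy can be checked numerically. The sketch below is an assumption-laden illustration: it estimates the integral of log2|T'(x)| against the uniform invariant density by Monte Carlo sampling (rather than by iterating a trajectory, which is sensitive to floating-point round-off), and the source {0.7, 0.2, 0.1} is a hypothetical example.

```python
import math
import random

def lyapunov_gls(probs, samples=200_000, seed=0):
    """Monte Carlo estimate of the integral of log2|T'(x)| over [0, 1).

    On the i-th interval (length p_i) the GLS slope has magnitude 1/p_i,
    and the invariant density is uniform, so x is drawn uniformly.
    """
    rng = random.Random(seed)
    edges = [0.0]
    for p in probs:
        edges.append(edges[-1] + p)
    total = 0.0
    for _ in range(samples):
        x = rng.random()
        i = 0
        # Locate the interval containing x.
        while i < len(probs) - 1 and x >= edges[i + 1]:
            i += 1
        total += math.log2(1.0 / probs[i])
    return total / samples

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.7, 0.2, 0.1]
estimate = lyapunov_gls(probs)
exact = shannon_entropy(probs)
print(estimate, exact)
```

The estimate agrees with the entropy up to sampling error. Because only the magnitude of the slope enters, the same value is obtained for every mode of the GLS.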



III. SUCCESSIVE SOURCE APPROXIMATION 
USING GLS 

In this section, we address the measurement problem
proposed in Section I. Throughout our analysis,
N > 2 (finite) and M = 2 is assumed. We are seeking
minimum-redundancy binary symbol codes. "Minimum- 
redundancy" is defined as follows [1|: 

Definition 1 (Minimum Redundancy) A binary
symbol code $C = \{c_1, c_2, \ldots, c_N\}$ with lengths
$L = \{l_1, l_2, \ldots, l_N\}$ for the i.i.d source X with alphabet
$A = \{a_1, a_2, \ldots, a_N\}$ with respective probabilities
$\{p_1, p_2, \ldots, p_N\}$ is said to have minimum-redundancy if
$L_C(X) = \sum_{i=1}^{N} l_i p_i$ is minimum.

For N = 2, the minimum-redundancy binary symbol
code for the alphabet $A = \{a_1, a_2\}$ is C = {0, 1} ($a_1 \mapsto 0$,
$a_2 \mapsto 1$). The goal of source coding is to minimize $L_C(X)$,
the average codeword length of C, since this is important
in any communication system. As we mentioned before,
it is always true that $L_C(X) \geq H(X)$.
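
Definition 1 can be made concrete with an exhaustive search over codeword lengths for a small alphabet. The sketch below relies on the Kraft-McMillan inequality mentioned in Section I (a binary prefix-free code with lengths $l_i$ exists if and only if $\sum_i 2^{-l_i} \leq 1$) and on the assumption that lengths above N − 1 never help for N ≥ 2; the function name and the example source are hypothetical.

```python
from fractions import Fraction
from itertools import product

def min_redundancy_lengths(probs):
    """Exhaustively search integer codeword lengths for the least average length.

    Only length tuples satisfying the Kraft-McMillan inequality are kept,
    since exactly those are realizable by a binary prefix-free code.
    Assumes at least two symbols.
    """
    n = len(probs)
    best = None
    for lengths in product(range(1, n), repeat=n):
        if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
            continue  # no prefix-free code has these lengths
        avg = sum(l * p for l, p in zip(lengths, probs))
        if best is None or avg < best[0]:
            best = (avg, lengths)
    return best

avg, lengths = min_redundancy_lengths([0.7, 0.2, 0.1])
print(avg, lengths)
```

The search is exponential in N and is meant only to illustrate the definition, not as a practical coding method.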
A. Successive Source Approximation Algorithm
using GLS 

Our approach is to approximate the original i.i.d source
(GLS with N partitions) with the best GLS with a reduced
number of partitions (reduced by 1). For the sake
of notational convenience, we shall term the original GLS
as order N (for original source $X_N$) and the reduced GLS
would be of order N − 1 (for approximating source $X_{N-1}$).
This new source $X_{N-1}$ is now approximated further with
the best possible source of order N − 2 ($X_{N-2}$). This
procedure of successive approximation of sources is repeated
until we end up with a GLS of order M = 2 ($X_2$). It
has only two partitions, for which we know the
minimum-redundancy symbol code is C = {0, 1}.

At any given stage q of approximation, the easiest way
to construct a source of order q − 1 is to merge two of the
existing q partitions. What should be the rationale for
determining which is the best (q − 1)-order approximating
source $X_{q-1}$ for the source $X_q$?

Definition 2 (Best Approximating Source)

Among all possible (q − 1)-order approximating sources,
the best approximation is the one which minimizes the
following quantity:

$$\Delta = \lambda(X_q) - \lambda(X_{q-1}), \qquad (4)$$

where $\lambda(\cdot)$ is the Lyapunov exponent of the argument.
The reason behind this choice is intuitive. We have already
established that the Lyapunov exponent is equal
to the Shannon's entropy for the GLS and that it represents
the amount of information (in bits) revealed by
the symbolic sequence of the source at every iteration.
Thus, the best approximating source should be as close
as possible to the original source in terms of Lyapunov
exponent.
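
Definition 2 can be illustrated by evaluating the quantity in Eq. (4) for every pairwise merge of a small source. The sketch below assumes a hypothetical source {A: 0.7, B: 0.2, C: 0.1} and uses the fact, established in Section II C, that the Lyapunov exponent equals the Shannon entropy; the helper names are illustrative.

```python
import math
from itertools import combinations

def lyapunov(probs):
    """Lyapunov exponent of the GLS (= Shannon entropy), in bits/iteration."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def merge_candidates(source):
    """Yield (delta, merged_pair, reduced_source) for every pairwise merge."""
    lam = lyapunov(source.values())
    for a, b in combinations(source, 2):
        reduced = {s: p for s, p in source.items() if s not in (a, b)}
        reduced[a + b] = source[a] + source[b]
        yield lam - lyapunov(reduced.values()), (a, b), reduced

source = {"A": 0.7, "B": 0.2, "C": 0.1}
candidates = sorted(merge_candidates(source), key=lambda t: t[0])
for delta, pair, reduced in candidates:
    print(pair, round(delta, 3))
best = candidates[0]
```

For this source the smallest loss is obtained by merging B and C, the two least probable symbols.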

There are three steps to our algorithm for finding a
minimum-redundancy binary symbol code, as given below:



Algorithm 1 Successive Source Approximation using
GLS

1. Embed the i.i.d source X into a GLS with N partitions
as described in Section II A. Initialize K = N. The source is
denoted by $X_K$ to indicate order K.

2. Approximate source $X_K$ with a GLS with K − 1 partitions
by merging the smallest two partitions to obtain
the source $X_{K-1}$ of order K − 1. Set K ← K − 1.

3. Repeat step 2 until the order of the GLS is 2 (K = 2), then
stop.
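
The three steps of Algorithm 1 can be sketched as follows. This is a minimal illustration, assuming the source is given as a dictionary from symbols to probabilities and using a binary heap to find the two smallest partitions; the function name and the four-symbol example are hypothetical.

```python
import heapq
from itertools import count

def successive_approximation(source):
    """Merge the two smallest partitions until only two remain.

    Returns the list of merges [(left, right, merged_probability), ...]
    and the final order-2 source as a list of (probability, label) pairs.
    """
    tie = count()  # tie-breaker so that labels are never compared
    heap = [(p, next(tie), label) for label, p in source.items()]
    heapq.heapify(heap)                       # step 1: order K = N
    merges = []
    while len(heap) > 2:                      # step 3: stop at order 2
        p1, _, a = heapq.heappop(heap)        # smallest partition
        p2, _, b = heapq.heappop(heap)        # second smallest partition
        merges.append((a, b, p1 + p2))
        heapq.heappush(heap, (p1 + p2, next(tie), (a, b)))  # step 2
    final = [(p, label) for p, _, label in sorted(heap)]
    return merges, final

merges, final = successive_approximation({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1})
print(merges)
print(final)
```

Each merged partition is labelled by the pair of partitions it was formed from, so the record of merges is enough to undo the approximation later.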



We shall prove that the approximating source which
merges the two smallest partitions is the best approximating
source. It shall be subsequently proved that
this algorithm leads to minimum redundancy, i.e., it
minimizes $L_C(X)$. The assignment of codewords to the
alphabets will also be shown.

Theorem 1: (Best Successive Source Approximation)
For a source $X_M$ which takes values from
$\{A_1, A_2, \ldots, A_{M-1}, A_M\}$ with probabilities
$\{p_1, p_2, \ldots, p_{M-1}, p_M\}$ respectively and with
$1 \geq p_1 \geq p_2 \geq \ldots \geq p_{M-1} \geq p_M > 0$ ($\sum_{i=1}^{M} p_i = 1$), the
source $X_{M-1}$ which is the best (M − 1)-order approximation
to $X_M$ has probabilities $\{p_1, p_2, \ldots, p_{M-2}, p_{M-1} + p_M\}$.

Proof:

By induction on M. For M = 1 and M = 2, there is
nothing to prove. We will first show that the statement
is true for M = 3.

• M = 3. $X_3$ takes values from $\{a_1, a_2, a_3\}$
with probabilities $\{p_1, p_2, p_3\}$ respectively and
$1 \geq p_1 \geq p_2 \geq p_3 > 0$ with $p_1 + p_2 + p_3 = 1$.

We need to show that $X_2$, which takes values
from $\{a_1, Z\}$ with probabilities $\{p_1, p_2 + p_3\}$,
is the best 2-order approximation to $X_3$. Here
Z is a symbol that represents the merged partition.

This means that we should show that this is
better than any other 2-order approximation.
There are two other 2-order approximations,
namely, $Y_2$ which takes values from $\{a_3, Z\}$ with
probabilities $\{p_3, p_2 + p_1\}$ and $W_2$ which takes
values from $\{a_2, Z\}$ with probabilities $\{p_2, p_1 + p_3\}$.

This implies that we need to show
$\lambda(X_3) - \lambda(X_2) \leq \lambda(X_3) - \lambda(Y_2)$ and
$\lambda(X_3) - \lambda(X_2) \leq \lambda(X_3) - \lambda(W_2)$.

• We shall prove $\lambda(X_3) - \lambda(X_2) \leq \lambda(X_3) - \lambda(Y_2)$.

This means that we need to prove $\lambda(X_2) \geq \lambda(Y_2)$, i.e.,
$-p_1\log_2(p_1) - (p_2 + p_3)\log_2(p_2 + p_3) \geq -p_3\log_2(p_3) - (p_1 + p_2)\log_2(p_1 + p_2)$.
Since $p_2 + p_3 = 1 - p_1$ and $p_1 + p_2 = 1 - p_3$, we need to show the following:

$$-p_1\log_2(p_1) - (1 - p_1)\log_2(1 - p_1) \geq -p_3\log_2(p_3) - (1 - p_3)\log_2(1 - p_3)$$
$$\Leftrightarrow \lambda_2(p_1) \geq \lambda_2(p_3),$$

where $\lambda_2(p)$ denotes the Lyapunov exponent of the binary
source plotted in Figure 4, which is increasing on [0, 0.5]
and satisfies $\lambda_2(p) = \lambda_2(1 - p)$.

There are two cases. If $p_1 \leq 0.5$, then since $p_3 \leq p_1$,
$\lambda_2(p_1) \geq \lambda_2(p_3)$. If $p_1 > 0.5$, then since $p_2 + p_3 = 1 - p_1$,
we have $p_3 \leq 1 - p_1 < 0.5$. This again implies
$\lambda_2(p_1) = \lambda_2(1 - p_1) \geq \lambda_2(p_3)$. Thus, we have proved that $X_2$ is
better than $Y_2$.

• We can follow the same argument to prove that
$\lambda(X_2) \geq \lambda(W_2)$. Thus, we have shown that the
theorem is true for M = 3. An illustrated example
is given in Figure 5.

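• Induction hypothesis: Assume that the theorem is
true for M = k; we need to prove that this implies
that the theorem is true for M = k + 1.

Let $X_{k+1}$ have the probability distribution
$\{p_1, p_2, \ldots, p_k, p_{k+1}\}$. Let us assume that $p_1 \neq 1$ (if
$p_1 = 1$, there is nothing to prove). This
means that $1 - p_1 > 0$. Divide all the probabilities
by $(1 - p_1)$ to get
$\{\frac{p_1}{1-p_1}, \frac{p_2}{1-p_1}, \ldots, \frac{p_k}{1-p_1}, \frac{p_{k+1}}{1-p_1}\}$.

Consider the set
$\{\frac{p_2}{1-p_1}, \frac{p_3}{1-p_1}, \ldots, \frac{p_k}{1-p_1}, \frac{p_{k+1}}{1-p_1}\}$. This
represents a probability distribution of a source
with k possible values and we know that the
theorem is true for M = k.

This means that the best source approximation for
this new distribution is a source with probability
distribution
$\{\frac{p_2}{1-p_1}, \ldots, \frac{p_{k-1}}{1-p_1}, \frac{p_k + p_{k+1}}{1-p_1}\}$.

In other words, this means:

$$-\sum_{i=2}^{k-1} \frac{p_i}{1-p_1}\log_2\left(\frac{p_i}{1-p_1}\right) - \frac{p_k + p_{k+1}}{1-p_1}\log_2\left(\frac{p_k + p_{k+1}}{1-p_1}\right)$$
$$\geq$$
$$-\sum_{i=2,\, i \neq r,\, i \neq s}^{k+1} \frac{p_i}{1-p_1}\log_2\left(\frac{p_i}{1-p_1}\right) - \frac{p_r + p_s}{1-p_1}\log_2\left(\frac{p_r + p_s}{1-p_1}\right),$$

where {r, s} is any pair of indices from {2, ..., k + 1} other than {k, k + 1}.
Multiply on both sides by $(1 - p_1)$ and simplify (the terms
involving $\log_2(1 - p_1)$ carry the same total weight
$\sum_{i=2}^{k+1} p_i = 1 - p_1$ on both sides and cancel) to
get:

$$-\sum_{i=2}^{k-1} p_i\log_2(p_i) - (p_k + p_{k+1})\log_2(p_k + p_{k+1}) \geq -\sum_{i=2,\, i \neq r,\, i \neq s}^{k+1} p_i\log_2(p_i) - (p_r + p_s)\log_2(p_r + p_s).$$

Add the term $-p_1\log_2(p_1) > 0$ on both sides and we
have proved that the best k-order approximation
to $X_{k+1}$ is the source $X_k$, where symbols with the
two least probabilities are merged together. We
have thus proved the theorem. □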
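
Theorem 1 lends itself to a quick randomized sanity check. The sketch below is not part of the proof: it draws hypothetical random sources with 3 to 8 symbols, finds by brute force the pairwise merge that loses the least entropy, and counts how often that merge differs from the one predicted by the theorem.

```python
import math
import random
from itertools import combinations

def entropy(probs):
    """Shannon entropy in bits (equal to the Lyapunov exponent of the GLS)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def best_merge(probs):
    """Return the index pair whose merge loses the least entropy."""
    def loss(pair):
        i, j = pair
        rest = [p for k, p in enumerate(probs) if k not in (i, j)]
        return entropy(probs) - entropy(rest + [probs[i] + probs[j]])
    return min(combinations(range(len(probs)), 2), key=loss)

rng = random.Random(1)
mismatches = 0
for _ in range(1000):
    m = rng.randint(3, 8)
    weights = [rng.random() + 0.01 for _ in range(m)]
    total = sum(weights)
    probs = sorted((w / total for w in weights), reverse=True)
    i, j = best_merge(probs)
    # Theorem 1 predicts the two least probable symbols (the last two here).
    if {i, j} != {m - 2, m - 1}:
        mismatches += 1
print("mismatches:", mismatches)
```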



B. Codewords are Symbolic Sequences 

At the end of Algorithm 1, we have an order-2 approximation
($X_2$). We allocate the code $C_2 = \{0, 1\}$ to the two
partitions. When we go from $X_2$ to $X_3$, the two sibling
partitions that were merged to form the parent partition
will get the codes 'S0' and 'S1', where 'S' is the codeword
of the parent partition. This process is repeated until we
have allocated codewords to $X_N$.
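
The allocation of codewords by un-merging can be sketched as below. The code is illustrative and makes two assumptions: symbols are not themselves tuples, and merging is continued down to a single root partition with the empty codeword, which gives the same result as stopping at order 2 and assigning {0, 1}. The function name `gls_codewords` is hypothetical.

```python
import heapq
from itertools import count

def gls_codewords(source):
    """Binary codewords from successive merging followed by un-merging."""
    tie = count()  # tie-breaker so that trees are never compared
    # Each heap entry is (probability, tie-breaker, tree); a tree is either
    # a symbol or a pair (left, right) of sibling partitions.
    heap = [(p, next(tie), sym) for sym, p in source.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (a, b)))
    _, _, root = heap[0]

    codes = {}

    def unmerge(tree, prefix):
        if isinstance(tree, tuple):          # parent partition with codeword S
            unmerge(tree[0], prefix + "0")   # one sibling gets 'S0'
            unmerge(tree[1], prefix + "1")   # the other sibling gets 'S1'
        else:
            codes[tree] = prefix

    unmerge(root, "")
    return codes

codes = gls_codewords({"A": 0.7, "B": 0.2, "C": 0.1})
print(codes)
```

Which sibling receives '0' and which receives '1' is a free choice at every level, so other assignments with the same codeword lengths are equally valid.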




[Figure 5 panels: (a) Source X: {A, B, C} with probabilities {0.7, 0.2, 0.1},
$\lambda_X = 1.156$. (d) A and C merged, giving partitions {0.8, 0.2} ($\lambda_{AC} = 0.72$).]

FIG. 5: Successive source approximation using GLS: An example.
Here, $\lambda_{BC}$ is the closest to $\lambda_X$. Unit of $\lambda(\cdot)$ is
bits/iteration.



It is interesting to realize that the codewords are
actually symbolic sequences on the standard binary
map. By allocating the code $C_2 = \{0, 1\}$ to $X_2$, we are
essentially treating the two partitions as having equal
probabilities although they may be highly skewed. In
fact, we are approximating the source $X_2$ as a GLS with
equal partitions (= 0.5 each), which is the standard binary
map. The code $C_2$ is thus the symbolic sequence on the
standard binary map. Now, moving up from $X_2$ to $X_3$,
we are doing the same approximation. We are treating
the two sibling partitions as having equal probabilities
and giving them the codes 'S0' and 'S1', which are
the symbolic sequences for those two partitions on the
standard binary map. Continuing in this fashion, we
see that all the codes are symbolic sequences on the
standard binary map. Every alphabet of the source X is
approximated to a partition on the binary map and the
codeword allocated to it is the corresponding symbolic
sequence. It will be proved that the approximation is
minimum redundancy and, as a consequence of this, if the
probabilities are all (negative integer) powers of 2, then the
approximation is not only minimum redundancy but its
average codeword length also equals the entropy of the
source ($L_C(X) = H(X)$).
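
The reading of codewords as symbolic sequences on the standard binary map can be checked directly. The sketch below assumes a hypothetical dyadic source {1/2, 1/4, 1/8, 1/8} with a Huffman code {0, 10, 110, 111}; each codeword of length l is associated with the dyadic interval of width $2^{-l}$ whose points share that binary prefix, and the helper names are illustrative.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)

def binary_map(x):
    """Standard binary map on [0, 1)."""
    return 2 * x if x < HALF else 2 * x - 1

def itinerary(x, n):
    """First n symbols of x on the standard binary map."""
    bits = []
    for _ in range(n):
        bits.append("0" if x < HALF else "1")
        x = binary_map(x)
    return "".join(bits)

def codeword_interval(code):
    """Dyadic interval [lo, hi) of all points whose itinerary starts with code."""
    width = Fraction(1, 2 ** len(code))
    lo = Fraction(int(code, 2), 2 ** len(code))
    return lo, lo + width

# Hypothetical dyadic source together with a Huffman code for it.
probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}

recovered = {}
for sym, code in codes.items():
    lo, hi = codeword_interval(code)
    recovered[sym] = itinerary((lo + hi) / 2, len(code))
    print(sym, code, float(lo), float(hi), recovered[sym])

avg_len = sum(probs[s] * len(codes[s]) for s in codes)
entropy = -sum(float(p) * math.log2(float(p)) for p in probs.values())
print(float(avg_len), entropy)
```

Here the width of each codeword's interval equals the probability of its symbol, which is why the average codeword length coincides with the entropy in the dyadic case.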

Theorem 2: (Successive Source Approximation)

The successive source approximation algorithm using
GLS yields minimum-redundancy (i.e., it minimizes
$L_C(X)$).

Proof:

We make the important observation that the successive
source approximation algorithm is in fact a re-discovery
of the binary Huffman coding algorithm [8], which is
known to minimize $L_C(X)$ and hence yields
minimum-redundancy: both repeatedly merge the two least
probable symbols, and the codewords allocated in the
previous section are the same as Huffman codes. The
theorem is thus proved. □

C. Encoding and Decoding 

We have described how by successively approximating 
the original stochastic i.i.d source using GLS, we arrive 
at a set of codewords for the alphabet which achieves 
minimum redundancy. The assignment of symbolic se- 
quences as codewords to the alphabet of the source is 
the process of encoding. Thus, given a series of obser- 
vations of X, the measuring device represents and stores 
these as codewords. For decoding, the reverse process 
needs to be applied, i.e., the codewords have to be re- 
placed by the observations. This can be performed by 
another device which has a look-up table consisting of 
the alphabet set and the corresponding codewords which 
were assigned originally by the measuring device. 
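
The encoding and decoding steps described above amount to a table look-up. The following is a minimal sketch, assuming a hypothetical look-up table for a three-symbol alphabet; decoding works by scanning the bit string from left to right, which is unambiguous because the code is prefix-free.

```python
def encode(message, table):
    """Replace each observed symbol by its codeword."""
    return "".join(table[s] for s in message)

def decode(bits, table):
    """Invert a prefix-free code by scanning the bit string left to right."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:      # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return out

table = {"A": "1", "B": "01", "C": "00"}   # hypothetical look-up table
bits = encode("ABACA", table)
decoded = decode(bits, table)
print(bits, decoded)
```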

IV. SOME REMARKS 

We make some important observations/remarks here: 

1. The faithful modeling of a stochastic i.i.d source as 
a GLS is a very important step. This ensured that 
the Lyapunov exponent captured the information 
content (Shannon's Entropy) of the source. 

2. Codewords are symbolic sequences on GLS. We 
could have chosen a different scheme for giving 
codewords than the one described here. For exam- 
ple, we could have chosen symbolic sequences on 
the Tent map as codewords. This would also corre- 
spond to a different set of Huffman codes, but with 
the same average codeword length Lc{X). Huff- 



man codes are not unique but depend on the way 
we assign codewords at every level. 

3. Huffman codes are symbol codes, i.e., each symbol 
in the alphabet is given a distinct codeword. We 
have investigated binary codes in this paper. An 
extension to the proposed algorithm is possible for 
ternary and higher bases. 

4. In another related work, we have used GLS to de- 
sign stream codes. Unlike symbol codes, stream 
codes encode multiple symbols at a time. There- 
fore, individual symbols in the alphabet no longer 
correspond to distinct codewords. By treating the 
entire message as a symbolic sequence on the GLS, 
we encode the initial condition which contains the 
same information. This achieves optimal lossless 
compression as demonstrated in [16].

5. We have extended GLS to piecewise non-linear, yet 
Lebesgue measure preserving discrete chaotic dy- 
namical systems. These have very interesting prop- 
erties (such as Robust Chaos in two parameters) 
and are useful for joint compression and encryption 
applications [16].
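
Remark 2 can be illustrated with a small experiment. The sketch below is one possible reading of the remark, not the authors' construction: it takes a hypothetical binary-map code {0, 10, 11}, associates each codeword with its dyadic interval, and replaces the codeword by the symbolic sequence of the interval's midpoint on the standard Tent map.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def tent_map(x):
    """Standard tent map: x -> 2x on [0, 0.5), x -> 2 - 2x on [0.5, 1)."""
    return 2 * x if x < HALF else 2 - 2 * x

def tent_itinerary(x, n):
    """First n symbols of x on the standard tent map."""
    bits = []
    for _ in range(n):
        bits.append("0" if x < HALF else "1")
        x = tent_map(x)
    return "".join(bits)

def tent_codeword(code):
    """Tent-map itinerary of the midpoint of the dyadic interval of `code`."""
    width = Fraction(1, 2 ** len(code))
    mid = Fraction(int(code, 2), 2 ** len(code)) + width / 2
    return tent_itinerary(mid, len(code))

def is_prefix_free(codewords):
    """True if no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

binary_codes = {"A": "0", "B": "10", "C": "11"}   # hypothetical binary-map code
tent_codes = {s: tent_codeword(c) for s, c in binary_codes.items()}
print(tent_codes)
```

The resulting codewords differ from the binary-map ones but have the same lengths and remain prefix-free, so the average codeword length is unchanged.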

V. CONCLUSIONS 

The source coding problem is motivated as a measurement
problem. A stochastic i.i.d source can be faithfully "em- 
bedded" into a piecewise linear chaotic dynamical sys- 
tem (GLS) which exhibits interesting properties. The 
Lyapunov exponent of the GLS is equal to Shannon's en- 
tropy of the i.i.d source. The measurement problem is 
addressed by successive source approximation using GLS 
with the nearest Lyapunov exponent (by merging the two 
least probable states). By assigning symbolic sequences 
as codewords, we re-discovered the popular Huffman cod- 
ing algorithm - a minimum redundancy symbol code for 
i.i.d sources. 